CUTOFF FOR NON-BACKTRACKING RANDOM WALKS ON
SPARSE RANDOM GRAPHS 

ANNA BEN-HAMOU AND JUSTIN SALEZ 

Abstract. A finite ergodic Markov chain is said to exhibit cutoff if its distance 
to stationarity remains close to 1 over a certain number of iterations and then 
abruptly drops to near 0 on a much shorter time scale. Discovered in the context 
of card shuffling (Aldous-Diaconis, 1986), this phenomenon is now believed to be 
rather typical among fast mixing Markov chains. Yet, establishing it rigorously 
often requires a challengingly detailed understanding of the underlying chain. 

Here we consider non-backtracking random walks on random graphs with a 
given degree sequence. Under a general sparsity condition, we establish the 
cutoff phenomenon, determine its precise window, and prove that the (suitably 
rescaled) cutoff profile approaches a remarkably simple, universal shape. 


1. Introduction

Figure 1. Distance to stationarity along time for the NBRW on a random graph with $10^6$ degree 3-vertices and $10^6$ degree 4-vertices.
1.1. Setting. Given a finite set $V$ and a function $\deg\colon V \to \{2, 3, \ldots\}$ such that

$$N := \sum_{v \in V} \deg(v) \tag{1}$$

is even, we construct a graph $G$ with vertex set $V$ and degrees $(\deg(v))_{v \in V}$ as follows. We form a set $\mathcal{X}$ by "attaching" $\deg(v)$ half-edges to each vertex $v \in V$:

$$\mathcal{X} := \{(v, i) : v \in V,\ 1 \le i \le \deg(v)\}.$$

We then simply choose a pairing $\pi$ on $\mathcal{X}$ (i.e., an involution without fixed points), and interpret every pair of matched half-edges $\{x, \pi(x)\}$ as an edge between the corresponding vertices. Loops and multiple edges are allowed.
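
The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the degrees are given as a dictionary, and the names `half_edges`, `uniform_pairing` and `multigraph_edges` are hypothetical helpers, not notation from the paper. Matching consecutive entries of a uniformly shuffled list is one standard way to obtain a uniformly distributed pairing.

```python
import random


def half_edges(degrees):
    """Return the half-edges (v, i), 1 <= i <= deg(v), of a degree map {v: deg(v)}."""
    return [(v, i) for v, d in sorted(degrees.items()) for i in range(1, d + 1)]


def uniform_pairing(X, rng):
    """Sample a uniform pairing of X: an involution pi on X without fixed points."""
    if len(X) % 2 != 0:
        raise ValueError("the number N of half-edges must be even")
    order = list(X)
    rng.shuffle(order)
    pi = {}
    # Matching consecutive entries of a uniform permutation gives a uniform pairing.
    for a, b in zip(order[0::2], order[1::2]):
        pi[a] = b
        pi[b] = a
    return pi


def multigraph_edges(pi):
    """List each matched pair {x, pi(x)} once, as an edge between the two vertices."""
    return [(x[0], pi[x][0]) for x in sorted(pi) if x < pi[x]]


degrees = {0: 3, 1: 3, 2: 4, 3: 4}   # toy degree sequence with N = 14
X = half_edges(degrees)
pi = uniform_pairing(X, random.Random(0))
print(multigraph_edges(pi))          # loops and multiple edges may appear
```

Each vertex appears in the edge list exactly as many times as its degree, a loop being counted twice.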



Figure 2. A set of half-edges $\mathcal{X}$, a pairing $\pi$ and the resulting graph $G$.

The non-backtracking random walk (NBRW) on the graph $G = G(\pi)$ is a discrete-time Markov chain with state space $\mathcal{X}$ and transition matrix

$$P(x, y) = \begin{cases} \dfrac{1}{\deg(\pi(x))} & \text{if } y \text{ is a neighbour of } \pi(x), \\[1ex] 0 & \text{otherwise.} \end{cases}$$

In this definition and throughout the paper, two half-edges $x = (u, i)$ and $y = (v, j)$ are called neighbours if $u = v$ and $i \ne j$, and we let $\deg(x) := \deg(u) - 1$ denote the number of neighbours of the half-edge $x = (u, i)$. In words, the chain moves at every step from the current state $x$ to a uniformly chosen neighbour of $\pi(x)$.

Figure 3. The non-backtracking moves from $x$ (in red).

Note that the matrix $P$ is symmetric with respect to $\pi$: for all $x, y \in \mathcal{X}$,

$$P(\pi(y), \pi(x)) = P(x, y). \tag{2}$$

In particular, $P$ is doubly stochastic: the uniform law on $\mathcal{X}$ is invariant for the chain. The worst-case total-variation distance to equilibrium at time $t \in \mathbb{N}$ is

$$\mathcal{D}(t) := \max_{x \in \mathcal{X}} \mathcal{D}_x(t), \quad \text{where} \quad \mathcal{D}_x(t) := \sum_{y \in \mathcal{X}} \left( \frac{1}{N} - P^t(x, y) \right)_+. \tag{3}$$

The function $t \mapsto \mathcal{D}(t)$ is non-increasing, and the number of transitions that have to be made before it falls below a given threshold $0 < \varepsilon < 1$ is known as the mixing time:

$$t_{\mathrm{mix}}(\varepsilon) := \inf\{t \in \mathbb{N} : \mathcal{D}(t) < \varepsilon\}.$$
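
For small instances, the transition matrix and the worst-case total-variation distance to the uniform law can be computed by brute force. The sketch below is illustrative only: it assumes a toy degree sequence and a pairing sampled with a fixed seed, uses plain nested lists instead of a linear-algebra library, and the names `nbrw_matrix` and `distance_profile` are hypothetical.

```python
import random


def nbrw_matrix(degrees, pi):
    """Transition matrix of the NBRW on G(pi), indexed by the sorted half-edges."""
    X = sorted(pi)
    index = {x: k for k, x in enumerate(X)}
    P = [[0.0] * len(X) for _ in X]
    for x in X:
        v, j = pi[x]                        # the half-edge matched with x
        for i in range(1, degrees[v] + 1):
            if i != j:                      # neighbours of pi(x): same vertex, other label
                P[index[x]][index[(v, i)]] = 1.0 / (degrees[v] - 1)
    return X, index, P


def distance_profile(P, t_max):
    """Worst-case total-variation distance to the uniform law at times 0..t_max."""
    n = len(P)
    Pt = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    profile = []
    for _ in range(t_max + 1):
        # Total variation to uniform: sum over y of the positive part of 1/n - P^t(x, y).
        profile.append(max(sum(max(1.0 / n - p, 0.0) for p in row) for row in Pt))
        Pt = [[sum(row[c] * P[c][b] for c in range(n)) for b in range(n)] for row in Pt]
    return profile


# Toy example: a random pairing of 14 half-edges (degrees 3, 3, 4, 4).
degrees = {0: 3, 1: 3, 2: 4, 3: 4}
order = [(v, i) for v, d in degrees.items() for i in range(1, d + 1)]
random.Random(1).shuffle(order)
pi = {}
for a, b in zip(order[0::2], order[1::2]):
    pi[a], pi[b] = b, a

X, index, P = nbrw_matrix(degrees, pi)
profile = distance_profile(P, 30)
print([round(d, 3) for d in profile[:6]])
```

On such a tiny graph no sharp cutoff is visible; the point is only to check the symmetry with respect to the pairing, the double stochasticity, and the monotonicity of the distance.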


1.2. Result. The present paper is concerned with the typical profile of the function $t \mapsto \mathcal{D}(t)$ under the so-called configuration model, i.e. when the pairing $\pi$ is chosen uniformly at random among the $(N-1)!!$ possible pairings of $\mathcal{X}$. In order to study large-size asymptotics, we let the vertex set $V$ and degree function $\deg\colon V \to \mathbb{N}$ depend on an implicit parameter $n \in \mathbb{N}$, which we omit from the notation for convenience. The same convention applies to all related quantities, such as $N$ or $\mathcal{X}$. All asymptotic statements are understood as $n \to \infty$. Our interest is in the sparse regime, where the number $N$ of half-edges diverges at a much faster rate than the maximum degree. Specifically, we assume that

$$\Delta := \max_{v \in V} \deg(v) = N^{o(1)}. \tag{4}$$

As the behaviour of the NBRW at degree-2 vertices is deterministic, we will also assume, without much loss of generality, that

$$\min_{v \in V} \deg(v) \ge 3. \tag{5}$$

Remarkably enough, the asymptotics in this regime depend on the degrees through two simple statistics: the mean logarithmic degree of a half-edge

$$\mu := \frac{1}{N} \sum_{v \in V} \deg(v) \log(\deg(v) - 1), \tag{6}$$

and the corresponding variance

$$\sigma^2 := \frac{1}{N} \sum_{v \in V} \deg(v) \left\{ \log(\deg(v) - 1) - \mu \right\}^2. \tag{7}$$

We will also need some control on the third absolute moment:

$$\varrho := \frac{1}{N} \sum_{v \in V} \deg(v) \left| \log(\deg(v) - 1) - \mu \right|^3. \tag{8}$$

It might help the reader to think of $\mu$, $\sigma$ and $\varrho$ as being fixed, or bounded away from $0$ and $\infty$. However, we only impose the following (much weaker) condition:

$$\frac{\sigma^2}{\mu^3} \gg \frac{(\log\log N)^2}{\log N} \quad \text{and} \quad \frac{\sigma^3}{\varrho \sqrt{\mu}} \gg \frac{1}{\sqrt{\log N}}. \tag{9}$$

Our main result states that on most graphs with degrees $(\deg(v))_{v \in V}$, the NBRW exhibits a remarkable behaviour, visible on Figure 1 and known as a cutoff: the distance to equilibrium remains close to 1 for a rather long time, roughly

$$t_\star := \frac{\log N}{\mu}, \tag{10}$$

and then abruptly drops to nearly 0 over a much shorter time scale¹ of order

$$w_\star := \sqrt{\frac{\sigma^2 \log N}{\mu^3}}. \tag{11}$$

Moreover, the cutoff shape inside this window approaches a surprisingly simple function $\Phi\colon \mathbb{R} \to [0, 1]$, namely the tail distribution of the standard normal:

$$\Phi(\lambda) := \frac{1}{\sqrt{2\pi}} \int_\lambda^\infty e^{-\frac{u^2}{2}} \, du.$$

It is remarkable that this limit shape does not depend at all on the precise degrees.
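
As a numerical illustration of these definitions, the following sketch evaluates the mean logarithmic degree, its variance and third absolute moment, the cutoff location and width, and the Gaussian tail, for a degree sequence given by its counts. The input format `{degree: number of vertices}`, the function names, and the example with a million vertices of each of the degrees 3 and 4 are assumptions made for the illustration.

```python
import math


def degree_statistics(degree_counts):
    """Return (N, mu, sigma, rho) for a map {degree: number of vertices}."""
    N = sum(d * c for d, c in degree_counts.items())
    mu = sum(d * c * math.log(d - 1) for d, c in degree_counts.items()) / N
    var = sum(d * c * (math.log(d - 1) - mu) ** 2 for d, c in degree_counts.items()) / N
    rho = sum(d * c * abs(math.log(d - 1) - mu) ** 3 for d, c in degree_counts.items()) / N
    return N, mu, math.sqrt(var), rho


def cutoff_parameters(degree_counts):
    """Return the cutoff location log(N)/mu and width sqrt(sigma^2 log(N)/mu^3)."""
    N, mu, sigma, _ = degree_statistics(degree_counts)
    t_star = math.log(N) / mu
    w_star = math.sqrt(sigma ** 2 * math.log(N) / mu ** 3)
    return t_star, w_star


def gaussian_tail(lam):
    """Tail distribution of the standard normal, via the complementary error function."""
    return 0.5 * math.erfc(lam / math.sqrt(2.0))


counts = {3: 10**6, 4: 10**6}          # illustrative degree counts
t_star, w_star = cutoff_parameters(counts)
for lam in (-2.0, 0.0, 2.0):
    # Predicted distance to stationarity at time t_star + lam * w_star.
    print(round(t_star + lam * w_star, 2), round(gaussian_tail(lam), 4))
```

For a regular degree sequence the variance vanishes, so the width degenerates to zero: the Gaussian profile is specific to heterogeneous degrees.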


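Theorem 1.1 (Cutoff for the NBRW on sparse graphs). For every $0 < \varepsilon < 1$,

$$\frac{t_{\mathrm{mix}}(\varepsilon) - t_\star}{w_\star} \xrightarrow{\ \mathbb{P}\ } \Phi^{-1}(\varepsilon).$$

Equivalently, for $t = t_\star + \lambda w_\star + o(w_\star)$ with $\lambda \in \mathbb{R}$ fixed, we have $\mathcal{D}(t) \xrightarrow{\ \mathbb{P}\ } \Phi(\lambda)$.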

1.3. Comments. It is interesting to compare this with the $d$-regular case (i.e., $\deg\colon V \to \mathbb{N}$ constant equal to $d$) studied by Lubetzky and Sly [27]: by a remarkably precise path counting argument, they establish cutoff within constantly many steps around $t_\star = \log N / \log(d-1)$. To appreciate the effect of heterogeneous degrees, recall that $\mu$ and $\sigma^2$ are the mean and variance of $\log D$, where $D$ is the degree of a uniformly sampled half-edge. Now, by Jensen's inequality,

$$t_\star \ge \frac{\log N}{\log \mathbb{E}[D]},$$

and the less concentrated $D$, the larger the gap. The right-hand side is a well-known characteristic length in $G$, namely the typical inter-point distance (see e.g., [SS]). One notable effect of heterogeneous degrees is thus that the mixing time becomes significantly larger than the natural graph distance. A heuristic explanation is as follows: in the regular case, all paths of length $t$ between two points are equally likely for the NBRW, and mixing occurs as soon as $t$ is large enough for many such paths to exist. In the non-regular case however, different paths have very different weights, and most of them actually have a negligible chance of being seen by the walk. Consequently, one has to make $t$ larger in order to see paths with a "reasonable" weight. Even more remarkable is the impact of heterogeneous degrees on the cutoff width $w_\star$, which satisfies $w_\star \gg \log\log N$ against $w_\star = O(1)$ in the regular case. Finally, the Gaussian limit shape $\Phi$ itself is specific to the non-regular case and is directly related to the fluctuations of degrees along a typical trajectory of the NBRW.

¹The fact that $w_\star \ll t_\star$ follows from condition (4).


Remark 1.2 (Simple graphs). A classical result by Janson [23] asserts that the graph produced by the configuration model is simple (no loops or multiple edges) with probability asymptotically bounded away from 0, as long as

$$\sum_{v \in V} \deg(v)^2 = O(N). \tag{12}$$

Moreover, conditionally on being simple, it is uniformly distributed over all simple graphs with degrees $(\deg(v))_{v \in V}$. Thus, every property which holds whp under the configuration model also holds whp for the uniform simple graph model. In particular, under (12), the conclusion of Theorem 1.1 extends to simple graphs.


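Remark 1.3 (IID degrees). A common setting consists in generating an infinite IID degree sequence $(\deg(v))_{v \in \mathbb{N}}$ from some fixed degree distribution $Q$ and then restricting it to the index set $V = \{1, \ldots, n\}$ for each $n \ge 1$. Let $D$ denote a random integer with distribution $Q$. Assuming that

$$\mathbb{P}(D \le 2) = 0, \quad \mathrm{Var}(D) > 0, \quad \text{and} \quad \mathbb{E}\left[e^{\theta D}\right] < \infty \ \text{ for some } \theta > 0,$$

ensures that the conditions (4), (5) and (9) hold almost surely. Thus, Theorem 1.1 applies with the parameters $\mu$, $\sigma$ and $N$ now being random. But the latter clearly concentrate around their deterministic counterparts, in the following sense:

$$N = n\,\mathbb{E}[D] + O_{\mathbb{P}}\left(n^{\frac12}\right),$$
$$\mu = \mu_\star + O_{\mathbb{P}}\left(n^{-\frac12}\right) \quad \text{with} \quad \mu_\star := \frac{\mathbb{E}[D \log(D-1)]}{\mathbb{E}[D]},$$
$$\sigma = \sigma_\star + O_{\mathbb{P}}\left(n^{-\frac12}\right) \quad \text{with} \quad \sigma_\star^2 := \frac{\mathbb{E}\left[D \{\log(D-1) - \mu_\star\}^2\right]}{\mathbb{E}[D]}.$$

Those error terms are small enough to allow one to substitute $n$, $\mu_\star$, $\sigma_\star$ for $N$, $\mu$, $\sigma$ without affecting the convergence stated in Theorem 1.1.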


1.4. Related work. The first instances of the cutoff phenomenon were discovered in the early 80's by Diaconis and Shahshahani na and Aldous [2], in the context of card shuffling: given a certain procedure for shuffling a deck of cards, there exists a quite precise number of shuffles slightly below which the deck is far from being mixed, and slightly above which it is almost completely mixed. The term cutoff and the general formalization appeared shortly after, in the seminal paper by Aldous and Diaconis [3]. Since then, this remarkable behaviour has been identified in a variety of other contexts, see e.g., Diaconis m, Chen and Saloff-Coste m, or the survey by Saloff-Coste [31] for random walks on finite groups.

Interacting particle systems in statistical mechanics provide a rich class of dynamics displaying cutoff. One emblematic example is the stochastic Ising model at high enough temperature, for which the cutoff phenomenon has been established successively on the complete graph (Levin et al. [2B]), on lattices (Ding et al. [TB], Lubetzky and Sly [2H]), and finally on any sequence of graphs (Lubetzky and Sly [29]). Other examples include the Potts model (Cuff et al. [13]), the East process (Ganguly et al. [20]), or the Simple Exclusion process on the cycle (Lacoin [23]).

The problem of singling out abstract conditions under which the cutoff phenomenon occurs, without necessarily pinpointing its precise location, has drawn considerable attention. In 2004, Peres [31] proposed a simple spectral criterion for reversible chains, known as the product condition. Although counter-examples have quickly been constructed (see Levin et al. |2Sl Chapter 18] and Chen and Saloff-Coste [10, Section 6]), the condition is widely believed to be sufficient for "most" chains. This has already been verified for birth-and-death chains (Ding et al. [17]) and, more generally, for random walks on trees (Basu et al. [S]). The latter result relies on a promising characterization of cutoff in terms of the concentration of hitting times of "worst" (in some sense) sets. See also Oliveira [30], Peres and Sousi [32], Griffiths et al. [21] and Hermon [22].

Many natural families of Markov chains are now believed to exhibit cutoff. Yet, establishing this phenomenon rigorously requires a very detailed understanding of the underlying chain, and often constitutes a challenging task even in situations with a high degree of symmetry. The historical case of random walks on the symmetric group for example, is still far from being completely understood: see Saloff-Coste [33] for a list of open problems, and Berestycki and Sengul [3] for a recent proof of one of them.

Understanding the mixing properties of random walks on sparse random graphs constitutes an important theoretical problem, with applications in a wide variety of contexts (see e.g., the survey by Cooper m). A classical result of Broder and Shamir [8] states that random $d$-regular graphs with $d$ fixed are expanders with high probability (see also Friedman Hi). In particular, the simple random walk (SRW) on such graphs satisfies the product condition, and should therefore exhibit cutoff. This long-standing conjecture was confirmed only recently in an impressive work by Lubetzky and Sly [27], who also determined the precise cutoff window and profile. Their result is actually derived from the analysis of the NBRW itself, via a clever transfer argument. Interestingly, the mixing time of the SRW is $d/(d-2)$ times larger than that of the NBRW. This confirms the practical advantage of NBRW over SRW for efficient network sampling and exploration, and complements a well-known spectral comparison for regular expanders due to Alon et al. [3], as well as a recent result by Cooper and Frieze [12] on the cover time of random regular graphs. For other ways of speeding up random walks, see Cooper ra.

In the non-regular case however, the tight correspondence between the SRW and the NBRW breaks down, and there seems to be no direct way of transferring our main result to the SRW. We note that the latter should exhibit cutoff since the product condition holds, as can be seen from the fact that the conductance of sparse random graphs with a given degree sequence remains bounded away from 0 (see Abdullah et al. m). Confirming this constitutes a challenging open problem. In particular, it would be interesting to see whether the SRW still mixes faster than the NBRW. To the best of our knowledge, no precise conjectural expression for the mixing time of the SRW has been put forward.²


2. Proof outline

The proof of Theorem 1.1 is divided into two (unequal) halves: for

$$t = t_\star + \lambda w_\star + o(w_\star), \tag{13}$$

with $\lambda \in \mathbb{R}$ fixed, we will show that

$$\mathbb{E}[\mathcal{D}(t)] \ge \Phi(\lambda) - o(1), \tag{14}$$
$$\mathcal{D}(t) \le \Phi(\lambda) + o_{\mathbb{P}}(1). \tag{15}$$

The lower bound (14) is proved in Section 3. The difficult part is the upper bound (15), due to the maximization over all possible initial states. Starting from state $x \in \mathcal{X}$, the distance to stationarity is

$$\mathcal{D}_x(t) = \sum_{y \in \mathcal{X}} \left( \frac{1}{N} - P^t(x, y) \right)_+ = \sum_{y \in \mathcal{X}} \left( \frac{1}{N} - P^t(x, \pi(y)) \right)_+,$$

since $\pi$ is an involution. Using the symmetry (2), we may re-write $P^t(x, \pi(y))$ as

$$P^t(x, \pi(y)) = \sum_{(u,v) \in \mathcal{X} \times \mathcal{X}} P^{t/2}(x, u)\, P^{t/2}(y, v)\, \mathbf{1}_{\{\pi(u) = v\}}. \tag{16}$$

As a first approximation, let us assume that the balls of radius $t/2$ around $x$ and $y$ consist of disjoint trees, in agreement with Figure 4. This is made rigorous by a particular exposure process described in Section 5. The weight $\mathbf{w}(u) := P^{t/2}(x, u)$ (resp. $\mathbf{w}(v) := P^{t/2}(y, v)$) can then be unambiguously written as the inverse product of degrees along the unique path from $x$ to $u$ (resp. $y$ to $v$).

A second approximation consists in eliminating those paths whose weight exceeds some given threshold $\theta > 0$ (the correct choice turns out to be $\theta \approx \frac{1}{N}$):

$$P^t(x, \pi(y)) \approx \sum_{u,v} \mathbf{w}(u)\mathbf{w}(v)\, \mathbf{1}_{\mathbf{w}(u)\mathbf{w}(v) \le \theta}\, \mathbf{1}_{\{\pi(u) = v\}}.$$

Conditionally on the two trees of height $t/2$, this is a weighted sum of weakly dependent Bernoulli variables, and the large-weight truncation should prevent it from deviating largely from its expectation. We make this argument rigorous in Section 6, using Stein's method of exchangeable pairs. Provided the exposure process did not reveal too many pairs of matched half-edges, the conditional expectation of $\mathbf{1}_{\{\pi(u) = v\}}$ remains close to $1/N$, and we obtain the new approximation

$$N P^t(x, \pi(y)) \approx \sum_{u,v} \mathbf{w}(u)\mathbf{w}(v)\, \mathbf{1}_{\mathbf{w}(u)\mathbf{w}(v) \le \theta}.$$

Now, the right-hand side corresponds to the quenched probability (conditionally on the graph) that the product of the weights seen by two independent NBRWs of length $t/2$, one starting from $x$ and the other from $y$, does not exceed $\theta$. The last step consists in approximating those trajectories by independent uniform samples from $\mathcal{X}$, which we denote by $X_1^\star, \ldots, X_t^\star$. We obtain

$$\sum_{u,v} \mathbf{w}(u)\mathbf{w}(v)\, \mathbf{1}_{\mathbf{w}(u)\mathbf{w}(v) \le \theta} \approx \mathbb{P}\left( \frac{1}{\deg(X_1^\star) \cdots \deg(X_t^\star)} \le \theta \right)$$
$$= \mathbb{P}\left( \frac{\sum_{k=1}^t \left( \mu - \log\deg(X_k^\star) \right)}{\sigma\sqrt{t}} \le \frac{\mu t + \log\theta}{\sigma\sqrt{t}} \right)$$
$$\approx 1 - \Phi(\lambda),$$

by the central limit theorem, since $\theta \approx 1/N$ and $t \approx t_\star + \lambda w_\star$. Consequently,

$$\mathcal{D}_x(t) \approx \Phi(\lambda),$$

as desired. This argument is made rigorous in Sections 7, 8 and 9.

²During the finalization of the manuscript, a solution to this problem has been announced [7].

Figure 4. The tree-approximation.


3. The lower bound

Fix $t \ge 1$, two states $x, y \in \mathcal{X}$, and a parameter $\theta \in (0, 1)$. Let $P_\theta^t(x, y)$ denote the contribution to $P^t(x, y)$ from paths having weight less than $\theta$. Note that $P_\theta^t(x, y) < P^t(x, y)$ if and only if some path of length $t$ from $x$ to $y$ has weight larger than $\theta$, implying in particular that $P^t(x, y) > \theta$. Thus,

$$\frac{1}{N} - P_\theta^t(x, y) \le \left( \frac{1}{N} - P^t(x, y) \right)_+ + \frac{1}{N}\, \mathbf{1}_{P^t(x,y) > \theta}.$$

Summing over all $y \in \mathcal{X}$ and observing that there can not be more than $1/\theta$ half-edges $y \in \mathcal{X}$ satisfying $P^t(x, y) > \theta$, we obtain

$$1 - \sum_{y \in \mathcal{X}} P_\theta^t(x, y) \le \mathcal{D}_x(t) + \frac{1}{\theta N}.$$

Now, the left-hand side is the quenched probability (i.e., conditional on the underlying pairing) that a NBRW $\{X_k\}_{0 \le k \le t}$ starting at $x$ satisfies $\prod_{k=1}^t \frac{1}{\deg(X_k)} > \theta$. Taking expectation w.r.t. the pairing, we arrive at

$$\mathbb{P}\left( \prod_{k=1}^t \frac{1}{\deg(X_k)} > \theta \right) \le \mathbb{E}[\mathcal{D}_x(t)] + \frac{1}{\theta N}, \tag{17}$$

where the average is now over both the NBRW and the pairing (annealed law). A useful property of the uniform pairing is that it can be constructed sequentially, the pairs being revealed along the way, as we need them. We exploit this degree of freedom to generate the walk $\{X_k\}_{k \ge 0}$ and the pairing simultaneously, as follows. Initially, all half-edges are unpaired and $X_0 = x$; then at each time $k \ge 1$,

(i) if $X_{k-1}$ is unpaired, we pair it with a uniformly chosen other unpaired half-edge; otherwise, $\pi(X_{k-1})$ is already defined and no new pair is formed.

(ii) in both cases, we let $X_k$ be a uniformly chosen neighbour of $\pi(X_{k-1})$.

The sequence $\{X_k\}_{k \ge 0}$ is then exactly distributed according to the annealed law. Now, if we sample uniformly from $\mathcal{X}$ instead of restricting the random choice made at (i) to unpaired half-edges, the uniform neighbour chosen at step (ii) also has the uniform law on $\mathcal{X}$. This creates a coupling between the process $\{X_k\}_{k \ge 1}$ and a sequence $\{X_k^\star\}_{k \ge 1}$ of IID samples from $\mathcal{X}$, valid until the first time $T$ where the uniformly chosen half-edge or its uniformly chosen neighbour is already paired. As there are fewer than $2k$ paired half-edges by step $k$, a crude union bound yields

$$\mathbb{P}(T \le t) \le \frac{2t^2}{N}.$$

Consequently,

$$\left| \mathbb{P}\left( \prod_{k=1}^t \frac{1}{\deg(X_k)} > \theta \right) - \mathbb{P}\left( \prod_{k=1}^t \frac{1}{\deg(X_k^\star)} > \theta \right) \right| \le \frac{2t^2}{N}. \tag{18}$$

On the other hand, since $\{X_1^\star, \ldots, X_t^\star\}$ are IID, Berry-Esseen's inequality implies

$$\left| \mathbb{P}\left( \prod_{k=1}^t \frac{1}{\deg(X_k^\star)} > \theta \right) - \Phi\left( \frac{\mu t + \log\theta}{\sigma\sqrt{t}} \right) \right| \le \frac{\varrho}{\sigma^3 \sqrt{t}}. \tag{19}$$

We may now combine (17), (18), (19) to obtain

$$\mathbb{E}[\mathcal{D}_x(t)] \ge \Phi\left( \frac{\mu t + \log\theta}{\sigma\sqrt{t}} \right) - \frac{1}{\theta N} - \frac{2t^2}{N} - \frac{\varrho}{\sigma^3 \sqrt{t}}.$$

With $t$ as in (13) and $\theta = (\log N)/N$, the right-hand side is $\Phi(\lambda) + o(1)$, thanks to our assumptions on $\mu$, $\sigma$, $\varrho$. This establishes the lower bound (14).
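
The sequential construction used in this section, where the walk and the pairing are generated together and pairs are only revealed when needed, can be sketched as follows. The function name `annealed_nbrw` and the dictionary representation of the partial pairing are assumptions made for the illustration, and the linear scan over unpaired half-edges is chosen for readability rather than efficiency.

```python
import random


def annealed_nbrw(degrees, x0, t, rng):
    """Generate (X_0, ..., X_t) together with the pairs revealed along the way."""
    X = [(v, i) for v, d in sorted(degrees.items()) for i in range(1, d + 1)]
    pi = {}                  # partial pairing revealed so far
    path = [x0]
    weight = 1.0             # running product of 1/deg(X_k), k = 1..t
    x = x0
    for _ in range(t):
        if x not in pi:
            # Step (i): pair x with a uniformly chosen other unpaired half-edge.
            candidates = [z for z in X if z not in pi and z != x]
            z = rng.choice(candidates)
            pi[x], pi[z] = z, x
        # Step (ii): move to a uniformly chosen neighbour of pi(x).
        v, j = pi[x]
        x = (v, rng.choice([i for i in range(1, degrees[v] + 1) if i != j]))
        path.append(x)
        weight /= degrees[x[0]] - 1      # deg of a half-edge = vertex degree - 1
    return path, pi, weight


degrees = {v: 3 for v in range(10)}      # toy 3-regular degree sequence, N = 30
path, pi, weight = annealed_nbrw(degrees, (0, 1), 8, random.Random(0))
print(path)
print(weight)
```

Since the total number of half-edges is even and pairs remove two half-edges at a time, an unpaired half-edge always has at least one other unpaired half-edge to be matched with, so step (i) never fails.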

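4. The upper-bound

Following Lubetzky and Sly [27], we call $x \in \mathcal{X}$ a root (written $x \in \mathcal{R}$) if the (directed) ball of radius $h$ centered at $x$ (denoted by $\mathcal{B}_x$) is a tree, where

$$h := \left\lfloor \frac{\log N}{10 \log\Delta} \wedge \log\log N \right\rfloor. \tag{20}$$

Note that $1 \ll h \ll w_\star$ by assumptions (4) and (9). The first proposition below shows that we may restrict our attention to paths between roots. The second proposition provides a good control on such paths.

Proposition 4.1 (Roots are quickly reached).

$$\max_{x \in \mathcal{X}} P^h(x, \mathcal{X} \setminus \mathcal{R}) \xrightarrow{\ \mathbb{P}\ } 0.$$

Proposition 4.2 (Roots are well inter-connected). For $t$ as in (13),

$$\min_{x \in \mathcal{R}}\ \min_{y \in \mathcal{R} \setminus \mathcal{B}_x} P^t(x, \pi(y)) \ge \frac{1 - \Phi(\lambda) - o_{\mathbb{P}}(1)}{N}.$$

Let us first see how those results imply the upper-bound (15). Observe that

$$\mathcal{D}(t + h) \le \max_{x \in \mathcal{X}} P^h(x, \mathcal{X} \setminus \mathcal{R}) + \max_{x \in \mathcal{R}} \mathcal{D}_x(t).$$

The first term is $o_{\mathbb{P}}(1)$ by Proposition 4.1. For the second one, we write

$$\mathcal{D}_x(t) = \sum_{y \in \mathcal{R} \setminus \mathcal{B}_x} \left( \frac{1}{N} - P^t(x, \pi(y)) \right)_+ + \sum_{y \in \mathcal{B}_x \cup (\mathcal{X} \setminus \mathcal{R})} \left( \frac{1}{N} - P^t(x, \pi(y)) \right)_+.$$

Proposition 4.2 ensures that the first sum is bounded by $\Phi(\lambda) + o_{\mathbb{P}}(1)$ uniformly in $x \in \mathcal{R}$. To see that the second sum is $o_{\mathbb{P}}(1)$ uniformly in $x \in \mathcal{R}$, it suffices to bound its summands by $1/N$ and observe that $|\mathcal{B}_x| = o(N)$ by (20), while

$$|\mathcal{X} \setminus \mathcal{R}| = \sum_{x \in \mathcal{X}} P^h(x, \mathcal{X} \setminus \mathcal{R}),$$

($P$ is doubly stochastic), which is $o_{\mathbb{P}}(N)$ by Proposition 4.1.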

Proof of Proposition 4.1. Define $r := \left\lfloor \frac{\log N}{5 \log\Delta} \right\rfloor$ and fix $x \in \mathcal{X}$. The ball of radius $r$ around $x$ can be generated sequentially, its half-edges being paired one after the other with uniformly chosen other unpaired half-edges, until the whole ball has been paired. Observe that at most $k = \frac{(\Delta-1)^r - 1}{\Delta - 2}$ pairs are formed. Moreover, for each of them, the number of unpaired half-edges having an already paired neighbour is at most $\Delta(\Delta-1)^r$ and hence the conditional chance of hitting such a half-edge (thereby creating a cycle) is at most $p = \frac{\Delta(\Delta-1)^r}{N - 2k}$. Thus, the probability that more than one cycle is found is at most

$$(kp)^2 = O\left( \frac{\Delta^{4r+2}}{N^2} \right) = o\left( \frac{1}{N} \right).$$

Summing over all $x \in \mathcal{X}$ (union bound), we obtain that with high probability, no ball of radius $r$ in $G(\pi)$ contains more than one cycle.

To conclude the proof, we now fix a pairing $\pi$ with the above property, and we prove that the NBRW on $G(\pi)$ starting from any $x \in \mathcal{X}$ satisfies

$$\mathbb{P}(X_t \text{ is not a root}) \le 2^{1-t}, \tag{21}$$

for all $t \le r - h$. The claim is trivial if the ball of radius $r$ around $x$ is acyclic. Otherwise, it contains a single cycle $\mathcal{C}$, by assumption. Write $d(x, \mathcal{C})$ for the minimum length of a non-backtracking path from $x$ to some $z \in \mathcal{C}$. The non-backtracking property ensures that if $d(X_t, \mathcal{C}) < d(X_{t+1}, \mathcal{C})$ for some $t < r - h$, then $X_{t+1}, X_{t+2}, \ldots, X_{r-h}$ are all roots. By (5), the conditional chance that $d(X_{t+1}, \mathcal{C}) = d(X_t, \mathcal{C}) + 1$ given the past is at least $1/2$ (unless $d(X_t, \mathcal{C}) = 1$, which can only happen once). This shows (21). We then specialize to $t = h$. □


5. The exposure process

The remainder of the paper is devoted to the proof of Proposition 4.2. Fix two distinct half-edges $x, y \in \mathcal{X}$. We describe a two-stage procedure that generates a uniform pairing on $\mathcal{X}$ together with a rooted forest $\mathcal{F}$ keeping track of certain paths from $x$ and $y$. Initially, all half-edges are unpaired and $\mathcal{F}$ is reduced to its two roots, $x$ and $y$. We then iterate the following three steps:

1. An unpaired half-edge $z \in \mathcal{F}$ is selected according to some rule (see below).

2. $z$ is paired with a uniformly chosen other unpaired half-edge $z'$.

3. If neither $z'$ nor any of its neighbours was already in $\mathcal{F}$, then all neighbours of $z'$ become children of $z$ in the forest $\mathcal{F}$.

The exploration stage stops when no unpaired half-edge is compatible with the selection rule. We then complete the pairing by matching all the remaining unpaired half-edges uniformly at random: this is the completion stage.

The condition in step 3 ensures that $\mathcal{F}$ remains a forest: any $z \in \mathcal{F}$ determines a unique sequence $(z_0, \ldots, z_h)$ in $\mathcal{F}$ such that $z_0$ is a root, $z_i$ is a child of $z_{i-1}$ for each $1 \le i \le h$, and $z_h = z$. We shall naturally refer to $h$ and $z_0$ as the height and root of $z$, respectively. We also define the weight of $z$ as

$$\mathbf{w}(z) := \prod_{i=1}^h \frac{1}{\deg(z_i)}.$$

Note that this quantity is the quenched probability that the sequence $(z_0, \ldots, z_h)$ is realized by a NBRW on $G$ starting from $z_0$. In particular,

$$\mathbf{w}(z) \le P^h(z_0, z). \tag{22}$$

Our rule for step 1 consists in selecting a half-edge with maximal weight³ among all unpaired $z \in \mathcal{F}$ with height $\mathbf{h}(z) < t/2$ and weight $\mathbf{w}(z) > w_{\min}$, where

$$w_{\min} := N^{-\frac{2}{3}}.$$

The only role of this parameter is to limit the number of pairs formed during the exploration stage. As outlined in Section 2, we shall be interested in

$$\mathcal{W} := \sum_{(u,v) \in \mathcal{H}_x \times \mathcal{H}_y} \mathbf{w}(u)\mathbf{w}(v)\, \mathbf{1}_{\mathbf{w}(u)\mathbf{w}(v) \le \theta},$$

where $\mathcal{H}_x$ (resp. $\mathcal{H}_y$) denotes the set of unpaired half-edges with height $\frac{t}{2}$ and root $x$ (resp. $y$) in $\mathcal{F}$ at the end of the exploration stage, and where

$$\theta := \frac{1}{N (\log N)^2}.$$

Write $\overline{\mathcal{W}}$ for the quantity obtained by replacing $\le$ with $>$ in $\mathcal{W}$, so that

$$\mathcal{W} + \overline{\mathcal{W}} = \sum_{(u,v) \in \mathcal{H}_x \times \mathcal{H}_y} \mathbf{w}(u)\mathbf{w}(v) \ge \sum_{z \in \mathcal{H}_x \cup \mathcal{H}_y} \mathbf{w}(z) - 1,$$

thanks to the inequality $ab \ge a + b - 1$ for $a, b \in [0, 1]$. Now, let $\mathcal{U}$ denote the set of unpaired half-edges in $\mathcal{F}$. By construction, at the end of the exploration stage, each $z \in \mathcal{U}$ must have height $t/2$ or weight less than $w_{\min}$, so that

$$\mathcal{W} + \overline{\mathcal{W}} \ge \sum_{z \in \mathcal{U}} \mathbf{w}(z) - \sum_{z \in \mathcal{F}} \mathbf{w}(z)\, \mathbf{1}_{(\mathbf{w}(z) < w_{\min})} - 1.$$

Therefore, Proposition 4.2 follows from the following four technical lemmas.


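Lemma 5.1. For every $\varepsilon > 0$,

$$\mathbb{P}\left( N P^t(x, \pi(y)) \le \mathcal{W} - \varepsilon \right) = o\left( \frac{1}{N^2} \right).$$

Lemma 5.2. For every $\varepsilon > 0$,

$$\mathbb{P}\left( \sum_{z \in \mathcal{F}} \mathbf{w}(z)\, \mathbf{1}_{(\mathbf{w}(z) < w_{\min})} > \varepsilon \right) = o\left( \frac{1}{N^2} \right).$$

Lemma 5.3. For every $\varepsilon > 0$,

$$\mathbb{P}\left( \overline{\mathcal{W}} > \Phi(\lambda) + \varepsilon \right) = o\left( \frac{1}{N^2} \right).$$

Lemma 5.4. For every $\varepsilon > 0$,

$$\mathbb{P}\left( x \in \mathcal{R},\ y \in \mathcal{R} \setminus \mathcal{B}_x,\ \sum_{z \in \mathcal{U}} \mathbf{w}(z) < 2 - \varepsilon \right) = o\left( \frac{1}{N^2} \right).$$

³For definiteness, let us say that we use the lexicographic order on $\mathcal{X}$ to break ties.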

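6. Proof of Lemma 5.1

Combining the representation (16) with the observation (22) yields

$$P^t(x, \pi(y)) \ge \sum_{(u,v) \in \mathcal{H}_x \times \mathcal{H}_y} \mathbf{w}(u)\mathbf{w}(v)\, \mathbf{1}_{\mathbf{w}(u)\mathbf{w}(v) \le \theta}\, \mathbf{1}_{\{\pi(u) = v\}}.$$

The right-hand side can be interpreted as the weight of the uniform pairing chosen at the completion stage, provided we define the weight of a pair $(u, v)$ as

$$\mathbf{w}(u)\mathbf{w}(v)\, \mathbf{1}_{\mathbf{w}(u)\mathbf{w}(v) \le \theta}\, \mathbf{1}_{(u,v) \in \mathcal{H}_x \times \mathcal{H}_y}. \tag{24}$$

Lemma 5.1 now follows from the following general concentration inequality, which we apply conditionally on the exploration stage, with $\mathcal{I}$ being the set of half-edges that did not get paired, and weights being given by (24).

Lemma 6.1. Let $\mathcal{I}$ be an even set, $\{w_{i,j}\}_{(i,j) \in \mathcal{I} \times \mathcal{I}}$ an array of non-negative weights, and $\pi$ a uniform random pairing on $\mathcal{I}$. Then for all $a > 0$,

$$\mathbb{P}\left( \sum_{i \in \mathcal{I}} w_{i,\pi(i)} \le m - a \right) \le \exp\left\{ -\frac{a^2}{4\theta m} \right\},$$

where $m = \frac{1}{|\mathcal{I}| - 1} \sum_{i \in \mathcal{I}} \sum_{j \ne i} w_{i,j}$ and $\theta = \max_{i \ne j} (w_{i,j} + w_{j,i})$.

Note that in our case, $m = \frac{\mathcal{W}}{|\mathcal{I}| - 1}$. Lemma 5.1 follows easily by taking $a = \frac{\varepsilon}{|\mathcal{I}| - 1}$ and observing that $|\mathcal{I}| - 1 \le N$ and $\mathcal{W} \le 1$.

Proof. We exploit the following concentration result for Stein pairs due to Chatterjee [S] (see also Ross [231 Theorem 7.4]): let $Y, Y'$ be bounded variables satisfying

(i) $(Y, Y') \overset{d}{=} (Y', Y)$;

(ii) $\mathbb{E}[Y' - Y \mid Y] = -\lambda Y$;

(iii) $\mathbb{E}[(Y' - Y)^2 \mid Y] \le \lambda (bY + c)$,

for some constants $\lambda \in (0, 1)$ and $b, c \ge 0$. Then for all $a > 0$,

$$\mathbb{P}(Y \le -a) \le \exp\left\{ -\frac{a^2}{c} \right\} \quad \text{and} \quad \mathbb{P}(Y \ge a) \le \exp\left\{ -\frac{a^2}{ab + c} \right\}.$$

We shall only use the first inequality. Consider the centered variable

$$Y := \sum_{i \in \mathcal{I}} w_{i,\pi(i)} - m,$$

and let $Y'$ be the corresponding quantity for the pairing $\pi'$ obtained from $\pi$ by performing a random switch: two indices $i, j$ are sampled uniformly at random from $\mathcal{I}$ without replacement, and the pairs $\{i, \pi(i)\}$, $\{j, \pi(j)\}$ are replaced with the pairs $\{i, j\}$, $\{\pi(i), \pi(j)\}$. This changes the weight by exactly

$$\Delta_{i,j} := w_{i,j} + w_{j,i} + w_{\pi(i),\pi(j)} + w_{\pi(j),\pi(i)} - w_{i,\pi(i)} - w_{\pi(i),i} - w_{j,\pi(j)} - w_{\pi(j),j}. \tag{25}$$

It is not hard to see that $(\pi, \pi') \overset{d}{=} (\pi', \pi)$, so that (i) holds. Moreover,

$$\mathbb{E}[Y' - Y \mid \pi] = \frac{1}{|\mathcal{I}|(|\mathcal{I}| - 1)} \sum_{i \in \mathcal{I}} \sum_{j \ne i} \Delta_{i,j} = -\frac{4}{|\mathcal{I}|}\, Y.$$

Regarding the square $\Delta_{i,j}^2 = |\Delta_{i,j}||\Delta_{i,j}|$, we may bound the first copy of $|\Delta_{i,j}|$ by $2\theta$ and the second by changing all minus signs to plus signs in (25), yielding

$$\mathbb{E}\left[ (Y' - Y)^2 \mid \pi \right] \le \frac{8\theta}{|\mathcal{I}|} \left( m + \sum_{i \in \mathcal{I}} w_{i,\pi(i)} \right) = \frac{8\theta}{|\mathcal{I}|} (2m + Y).$$

Note that taking conditional expectation with respect to $Y$ does not affect the right-hand side. Thus, (ii) and (iii) hold with $\lambda = \frac{4}{|\mathcal{I}|}$, $b = 2\theta$ and $c = 4m\theta$. □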
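
The linearity condition (ii) for the random switch can be checked by brute force on a small instance. The sketch below is only a sanity check under stated assumptions: indices are the integers 0 to n-1, weights are random integers so that the arithmetic is exact, the drift is averaged over all ordered pairs of distinct indices, and the helper names are hypothetical. It compares that average with minus four times the centered weight divided by the size of the index set.

```python
import random
from fractions import Fraction


def random_pairing(n, rng):
    """Uniform pairing of {0, ..., n-1} (n even), stored as a list pi."""
    order = list(range(n))
    rng.shuffle(order)
    pi = [0] * n
    for a, b in zip(order[0::2], order[1::2]):
        pi[a], pi[b] = b, a
    return pi


def pairing_weight(w, pi):
    """Total weight sum_i w[i][pi(i)] of the pairing pi."""
    return sum(w[i][pi[i]] for i in range(len(pi)))


def switch(pi, i, j):
    """Replace the pairs {i, pi(i)}, {j, pi(j)} with {i, j}, {pi(i), pi(j)}."""
    if pi[i] == j:
        return list(pi)              # i and j are already matched: nothing changes
    new = list(pi)
    a, b = pi[i], pi[j]
    new[i], new[j] = j, i
    new[a], new[b] = b, a
    return new


def mean_drift(w, pi):
    """Average of Y' - Y over all ordered pairs (i, j) with i != j, given pi."""
    n = len(pi)
    base = pairing_weight(w, pi)
    total = sum(pairing_weight(w, switch(pi, i, j)) - base
                for i in range(n) for j in range(n) if i != j)
    return Fraction(total, n * (n - 1))


rng = random.Random(0)
n = 8
w = [[0 if i == j else rng.randint(0, 9) for j in range(n)] for i in range(n)]
pi = random_pairing(n, rng)
m = Fraction(sum(w[i][j] for i in range(n) for j in range(n) if i != j), n - 1)
Y = pairing_weight(w, pi) - m
print(mean_drift(w, pi) == Fraction(-4) * Y / n)
```

The same enumeration could be extended to the second-moment bound (iii), at the cost of a slightly longer computation.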


7. Proof of Lemma 5.2

We may fix $z_0 \in \{x, y\}$ and restrict our attention to the halved sum

$$Z := \sum_{z \in \mathcal{F}} \mathbf{w}(z)\, \mathbf{1}_{(\mathbf{w}(z) < w_{\min})}\, \mathbf{1}_{(z \text{ has root } z_0)}.$$

Consider $m = \lfloor \log N \rfloor$ independent NBRWs on $G(\pi)$ starting at $z_0$, each being killed as soon as its weight falls below $w_{\min}$, and write $A$ for the event that their trajectories form a tree of height less than $t/2$. Clearly, $\mathbb{P}(A \mid \pi) \ge Z^m$. Taking expectation and using Markov's inequality, we deduce that

$$\mathbb{P}(Z > \varepsilon) \le \frac{\mathbb{P}(A)}{\varepsilon^m},$$

where the average is now taken over both the walks and the pairing. Recalling that $m = \lfloor \log N \rfloor$, it is more than enough to establish that $\mathbb{P}(A) = (o(1))^m$. To do so, we generate the $m$ killed NBRWs one after the other, revealing the underlying pairs along the way, as described in Section 3. Given that the first $\ell - 1$ walks form a tree of height less than $t/2$, the conditional chance that the $\ell$-th walk also fulfils the requirement is $o(1)$, uniformly in $1 \le \ell \le m$. Indeed,

• either its weight falls below $\eta = (1/\log N)^2$ before it ever reaches an unpaired half-edge: thanks to the tree structure, there are at most $\ell - 1 < m$ possible trajectories to follow, so the chance is less than

$$m\eta = o(1).$$

• or the remainder of its trajectory after the first unpaired half-edge has weight less than $\Delta w_{\min}/\eta$. This part consists of at most $t/2$ half-edges which can be coupled with uniform samples from $\mathcal{X}$ for a total-variation cost of $mt^2/N$, as in Section 3. Thus, the conditional chance is at most

$$\frac{mt^2}{N} + \mathbb{P}\left( \prod_{k=1}^{t/2} \frac{1}{\deg(X_k^\star)} < \frac{\Delta w_{\min}}{\eta} \right) = o(1),$$

by Chebyshev's inequality, since

$$\log\frac{\eta}{\Delta w_{\min}} - \frac{\mu t}{2} \gg \sigma\sqrt{t}.$$



8. Proof of Lemma 5.3

Set $m = \lfloor \log N \rfloor$. On $G(\pi)$, let $X^{(1)}, \ldots, X^{(m)}$ and $Y^{(1)}, \ldots, Y^{(m)}$ be $2m$ independent NBRWs of length $t/2$ starting at $x$ and $y$ respectively. Let $B$ denote the event that their trajectories form two disjoint trees and that for all $1 \le k \le m$,

$$\prod_{\ell=1}^{t/2} \frac{1}{\deg\left(X_\ell^{(k)}\right)} \prod_{\ell=1}^{t/2} \frac{1}{\deg\left(Y_\ell^{(k)}\right)} > \theta.$$

Then clearly, $\mathbb{P}(B \mid \pi) \ge \overline{\mathcal{W}}^{\,m}$. Averaging w.r.t. the pairing $\pi$, we see that

$$\mathbb{P}\left( \overline{\mathcal{W}} > \Phi(\lambda) + \varepsilon \right) \le \frac{\mathbb{P}(B)}{(\Phi(\lambda) + \varepsilon)^m}.$$

Thus, it is enough to establish that $\mathbb{P}(B) \le (\Phi(\lambda) + o(1))^m$. We do so by generating the $2m$ walks $X^{(1)}, Y^{(1)}, \ldots, X^{(m)}, Y^{(m)}$ one after the other along with the underlying pairing, as above. Given that $X^{(1)}, Y^{(1)}, \ldots, X^{(\ell-1)}, Y^{(\ell-1)}$ already satisfy the desired property, the conditional chance that $X^{(\ell)}, Y^{(\ell)}$ also does is at most $\Phi(\lambda) + o(1)$, uniformly in $1 \le \ell \le m$. Indeed,

• either one of the two walks attains length $s = \lfloor 2 \log\log N \rfloor$ before reaching an unpaired half-edge: there are at most $\ell - 1 < m$ possible trajectories to follow for each walk, so the conditional chance is at most

$$2m2^{-s} = o(1).$$

• or at least $t - 2s$ unpaired half-edges are encountered, and the product of their degrees falls below $\frac{1}{\theta}$ with conditional probability at most

$$\frac{4mt^2}{N} + \mathbb{P}\left( \prod_{k=1}^{t-2s} \deg(X_k^\star) < \frac{1}{\theta} \right) = \Phi(\lambda) + o(1),$$

by the same coupling as above and Berry-Esseen's inequality (19).


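9. Proof of Lemma 5.4

We denote by $\tau$ the (random) number of pairs formed during the exploration stage. For $k \ge 0$, we let $\mathcal{U}_k$ denote the set of unpaired half-edges in the forest after $k \wedge \tau$ pairs have been formed, and we consider the random variable

$$W_k := \sum_{z \in \mathcal{U}_k} \mathbf{w}(z).$$

Initially $W_0 = 2$, and this quantity either stays constant or decreases at each stage, depending on whether the condition appearing in step 3 is satisfied or not. More precisely, denoting by $z_k$ (resp. $z_k'$) the half-edge selected at step 1 (resp. chosen at step 2) of the $k$-th pair, we have for all $k \ge 1$,

$$W_k - W_{k-1} = -\mathbf{1}_{(k \le \tau)} \left( \mathbf{w}(z_k)\, \mathbf{1}_{(z_k' \in \mathcal{U}_{k-1}^+)} + \mathbf{w}(z_k')\, \mathbf{1}_{(z_k' \in \mathcal{U}_{k-1})} \right),$$

where $\mathcal{U}_{k-1}^+$ is the union of $\mathcal{U}_{k-1}$ and the set of unpaired neighbours of the roots. Now, let $\{\mathcal{G}_k\}_{k \ge 0}$ be the natural filtration associated with the exploration stage. Note that $\tau$ is a stopping time, that $\mathbf{w}(z_k)$ is $\mathcal{G}_{k-1}$-measurable, and that the conditional law of $z_k'$ given $\mathcal{G}_{k-1}$ is uniform on $\mathcal{X} \setminus \{z_1, \ldots, z_k, z_1', \ldots, z_{k-1}'\}$. Thus,

$$\mathbb{E}[W_k - W_{k-1} \mid \mathcal{G}_{k-1}] = -\mathbf{1}_{(k \le \tau)}\, \frac{\mathbf{w}(z_k)\left( |\mathcal{U}_{k-1}^+| - 2 \right) + W_{k-1}}{N - 2k + 1},$$

$$\mathbb{E}\left[ (W_k - W_{k-1})^2 \mid \mathcal{G}_{k-1} \right] = \mathbf{1}_{(k \le \tau)}\, \frac{\mathbf{w}(z_k)^2 \left( |\mathcal{U}_{k-1}^+| - 4 \right) + 2\mathbf{w}(z_k) W_{k-1} + \sum_{z \in \mathcal{U}_{k-1}} \mathbf{w}(z)^2}{N - 2k + 1}.$$

To bound those quantities, observe that each half-edge in $\mathcal{U}_{k-1}$ has weight at least $\frac{\mathbf{w}(z_k)}{\Delta}$ because its parent has been selected at an earlier iteration and our selection rule ensures that the quantity $\mathbf{w}(z_k)$ is non-increasing with $k$. Consequently,

$$|\mathcal{U}_{k-1}|\, \frac{\mathbf{w}(z_k)}{\Delta} \le \sum_{z \in \mathcal{U}_{k-1}} \mathbf{w}(z) \le 2.$$

Combining this with the bound $|\mathcal{U}_{k-1}^+| \le |\mathcal{U}_{k-1}| + 2\Delta$, we arrive at

$$\mathbb{E}[W_k - W_{k-1} \mid \mathcal{G}_{k-1}] \ge -\mathbf{1}_{(k \le \tau)}\, \frac{4\Delta}{N - 2k + 1},$$

$$\mathbb{E}\left[ (W_k - W_{k-1})^2 \mid \mathcal{G}_{k-1} \right] \le \mathbf{1}_{(k \le \tau)}\, \frac{4\Delta \mathbf{w}(z_k) + 2}{N - 2k + 1}.$$

Now recall that $\mathbf{w}(z_k) > w_{\min}$ and $\mathbf{h}(z_k) < \frac{t}{2}$ as per our selection rule, implying

$$\tau\, w_{\min} \le \sum_{k \ge 1} \mathbf{w}(z_k)\, \mathbf{1}_{(\tau \ge k)} \le \sum_{z \in \mathcal{F}} \mathbf{w}(z)\, \mathbf{1}_{(\mathbf{h}(z) < \frac{t}{2})} \le t.$$

The right-most inequality follows from the fact that the total weight at a given height in $\mathcal{F}$ is at most 2 (the total weight being preserved from a parent to its children, if any). We conclude that

$$\sum_{k=1}^\infty \mathbb{E}[W_k - W_{k-1} \mid \mathcal{G}_{k-1}] \ge -\frac{4\Delta t}{w_{\min} N - 2t},$$

$$\sum_{k=1}^\infty \mathbb{E}\left[ (W_k - W_{k-1})^2 \mid \mathcal{G}_{k-1} \right] \le \frac{4\Delta t\, w_{\min} + 2t}{N w_{\min} - 2t}.$$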


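Now, fix $\varepsilon > 0$ and consider the martingale $\{M_k\}_{k \ge 0}$ defined by $M_0 = 0$ and

$$M_k := \sum_{i=1}^k \left\{ (W_{i-1} - W_i) \wedge \varepsilon - \mathbb{E}\left[ (W_{i-1} - W_i) \wedge \varepsilon \mid \mathcal{G}_{i-1} \right] \right\}.$$

Then the increments of $\{M_k\}_{k \ge 0}$ are bounded by $\varepsilon$ by construction, and the above computation guarantees that almost-surely,

$$\sum_{k=1}^\infty \mathbb{E}\left[ (M_k - M_{k-1})^2 \mid \mathcal{G}_{k-1} \right] \le \eta := \frac{4\Delta t\, w_{\min} + 2t}{N w_{\min} - 2t}.$$

Thus, the martingale version of Bennett's inequality due to Freedman ng yields

$$\mathbb{P}(M_\tau > 7\varepsilon) \le \left( \frac{e\eta}{7\varepsilon^2} \right)^7 = N^{-\frac{7}{3} + o(1)}. \tag{26}$$

But on the event $\{x \in \mathcal{R},\ y \in \mathcal{R} \setminus \mathcal{B}_x\}$, all paths from the set $\{x, y\}$ to itself must have length at least $h$, and since $h \to \infty$, we must have asymptotically

$$\{x \in \mathcal{R},\ y \in \mathcal{R} \setminus \mathcal{B}_x\} \subset \left\{ \max_{k \ge 1} (W_{k-1} - W_k) < \varepsilon \right\} \subset \left\{ W_0 - W_\tau \le M_\tau + \varepsilon \right\}.$$

With (26), this proves Lemma 5.4 since $W_0 - W_\tau = 2 - \sum_{z \in \mathcal{U}} \mathbf{w}(z)$.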